Abstract We point out a subtle error in the proof of Chrobak's theorem that every unary 
NFA can be represented as a union of arithmetic progressions that is at most quadratically 
large. We propose a correction for this and show how Martinez's polynomial time algorithm, 
which realizes Chrobak's theorem, can be made correct accordingly. We also show that 
Martinez's algorithm cannot be improved to have logarithmic space, unless L = NL. 
1 Introduction 

A language is unary if it is defined over some unary (i.e. singleton) alphabet. Every unary 
language L can be faithfully represented as a set of natural numbers describing the lengths 
of the words in L (a.k.a. the Parikh image of L). It is well-known [6] that a unary NFA can 
be determinized and converted to an exponentially large union of arithmetic progressions. 
It turns out that this exponential blow-up is avoidable. Chrobak [4] showed a fundamental 
theorem that every unary NFA has an equivalent one in some normal form, commonly called 
Chrobak normal form, which is at most quadratically large. In fact, NFAs in Chrobak normal 
form can easily be converted into an equivalent union of arithmetic progressions in linear 
time. However, Chrobak's original construction is non-algorithmic. Recently, there has been 
some effort to make Chrobak's construction algorithmic. This has resulted in Martinez's 
polynomial time algorithm [9,10] and Litow's quasi-polynomial time algorithm [7]. These 
results have been widely applied (e.g., see [2,5,9,10]). 

The goal of this paper is two-fold. First, we would like to point out a subtle error in 
Chrobak's proof in [4], for which a fix does not seem to be immediately obvious. The 
gist of the error is the incorrect assumption that in a directed graph with a fixed initial point 
and a fixed end point, one can traverse the cycles in an arbitrary way without taking into 
account the dependencies between them (see Example 1 below). Since the correctness proof 
of Martinez's algorithm relies on the correctness proof of Chrobak's construction (see [10]), 
one can no longer assume that Martinez's algorithm is correct. Furthermore, it turns out that 
the same error also occurs in Litow's proof of correctness of his algorithm [7]. Secondly, we 
would like to propose a correction for this error and show how Martinez's algorithm can be 
made correct accordingly. We will also make a simple observation that one cannot improve 
Martinez's algorithm to have logarithmic working space unless L = NL. 

Outline In Section 2 we pinpoint the location of the gap in Chrobak's proof. We show how 
to bridge this gap in Section 3. Section 4 shows how Martinez's algorithm can easily be 
fixed and proves that Martinez's algorithm cannot be improved to have logarithmic space 
unless L = NL. 

Notations We use standard notations from automata theory (see [6]). We shall not worry 
about which unary alphabet $\Sigma$ to use since any unary language $L$ over $\Sigma$ is completely 
determined by the lengths of the words in $L$. A unary NFA $\mathcal{A}$ is then a quadruple 
$(Q, q_0, \delta, F)$, where $Q$ is a set of states, $q_0$ is an initial state, $\delta \subseteq Q \times Q$ 
is a transition relation, and $F$ is a set of final states. For $q \in Q$, we use $\mathrm{succ}(q)$ to 
denote the set $\{q' : (q,q') \in \delta\}$ of all successors of $q$ in $\mathcal{A}$. We use $\mathbb{N}$ for 
$\{0, 1, 2, \ldots\}$. Given $a, b \in \mathbb{N}$, an arithmetic progression with offset $a$ and period $b$ 
is the set $a + b\mathbb{N} := \{a + bk : k \in \mathbb{N}\}$. In the sequel, when we deal 
with computational problems we always represent numbers in unary. 

2 The error 

We first state the important lemma that is used in [4]. For a highly readable proof, see also 
the recent survey [11] on the Frobenius problem. 

Lemma 1 ([3,8]) Let $0 < a_1 < \ldots < a_k \le n$ be natural numbers. Then, if $X$ is the set of all 
$x \in \mathbb{N}$ for which the linear equation $a_1 x_1 + \ldots + a_k x_k = x$ is solvable in natural numbers, 
then 

$$X = S \cup (a + b\mathbb{N})$$

where $S \subseteq \mathbb{N}$ contains no numbers bigger than $n^2$, and $a$ is the least integer bigger than $n^2$ 
that is a multiple of $b := \gcd(a_1, \ldots, a_k)$. 
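
Lemma 1 can be sanity-checked numerically on small inputs. The following is a minimal sketch, not part of the original argument; the function names `representable` and `check_lemma` and the finite `horizon` are assumptions made for illustration. It computes which numbers are representable and confirms that, beyond $n^2$ and up to the horizon, these are exactly the multiples of $b$.

```python
from functools import reduce
from math import gcd


def representable(coeffs, limit):
    """Set of x <= limit for which a_1*x_1 + ... + a_k*x_k = x is solvable in naturals."""
    ok = [False] * (limit + 1)
    ok[0] = True  # the empty combination
    for x in range(1, limit + 1):
        ok[x] = any(x >= a and ok[x - a] for a in coeffs)
    return {x for x in range(limit + 1) if ok[x]}


def check_lemma(coeffs, n, horizon):
    """Check, up to `horizon`, that X above n^2 is exactly the multiples of gcd(coeffs)."""
    b = reduce(gcd, coeffs)
    tail = {x for x in representable(coeffs, horizon) if x > n * n}
    expected = {x for x in range(n * n + 1, horizon + 1) if x % b == 0}
    return tail == expected
```

For example, `check_lemma([5, 7, 8, 20], 20, 1000)` inspects the coefficients that appear in Example 1 below; the check is only evidence up to the chosen horizon, not a proof.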

A unary NFA $\mathcal{A} = (Q, q_0, \delta, F)$ is in Chrobak normal form if $Q$ can be described as a union 
of $\{q_0, \ldots, q_m\}$ and the pairwise disjoint sets $C_1, \ldots, C_k$ such that 

- for each $1 \le i \le k$, $C_i = \{p_{i,0}, p_{i,1}, \ldots, p_{i,j_i-1}\}$ for some integer $j_i > 0$, 

- for each $0 \le i \le m - 1$, $\mathrm{succ}(q_i) = \{q_{i+1}\}$, 

- $\mathrm{succ}(q_m) = \{p_{1,0}, p_{2,0}, \ldots, p_{k,0}\}$, and 

- for every $1 \le i \le k$ and $0 \le h \le j_i - 1$, $\mathrm{succ}(p_{i,h}) = \{p_{i,(h+1 \bmod j_i)}\}$. 

Fig. 1 An automaton in Chrobak normal form. Filled circles are final states. 

Figure 1 gives an example of an NFA in Chrobak normal form. In other words, an NFA $\mathcal{A}$ is 
in Chrobak normal form if the structure of $\mathcal{A}$ is a unique path from $q_0$ to the unique state $q_m$ 
at which $\mathcal{A}$ can make a nondeterministic choice to one of the few disjoint cycles of possibly 
different periods. Note that there can be more than one final state in $F$. Unary DFAs can be 
thought of as a subclass of NFAs in Chrobak normal form where $q_m$ has a unique successor 
state or is a dead end. There is a simple linear-time procedure for converting NFAs in 
Chrobak normal form to a union of arithmetic progressions. For example, the NFA in Figure 1 
is equivalent to the set $(1 + 0\mathbb{N}) \cup (2 + 0\mathbb{N}) \cup (5 + 3\mathbb{N}) \cup (5 + 4\mathbb{N}) \cup (6 + 4\mathbb{N}) \cup (4 + 2\mathbb{N})$. 
Conversely, as offsets and periods in arithmetic progressions are represented in unary, there 
is also a trivial linear-time algorithm that converts a union of arithmetic progressions to an 
NFA in Chrobak normal form. Simply use the largest offset as the length of the unique path, 
while selecting the final states appropriately. Note that applying this procedure to the preceding 
example will give us an equivalent NFA that is different from the one in Figure 1. Therefore, 
NFAs in Chrobak normal form and unions of arithmetic progressions are polynomially 
equivalent representations of unary regular languages. In the sequel, we shall use these two 
representations interchangeably. 
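
The conversion from Chrobak normal form to arithmetic progressions can be sketched as follows. The encoding is an assumption made for illustration: `m` is the index of the last path state $q_m$, `tail_finals` holds the indices $i$ with $q_i \in F$, and each cycle is a pair (length $j_i$, set of positions $h$ with $p_{i,h} \in F$). A final state $q_i$ contributes $i + 0\mathbb{N}$, and a final state $p_{i,h}$ contributes $(m + 1 + h) + j_i\mathbb{N}$.

```python
def cnf_to_progressions(m, tail_finals, cycles):
    """Return (offset, period) pairs for an NFA in Chrobak normal form.

    m           -- index of the last state q_m on the initial path
    tail_finals -- indices i such that q_i is final
    cycles      -- list of (length, final_positions) pairs, one per cycle C_i
    """
    progressions = [(i, 0) for i in sorted(tail_finals)]
    for length, finals in cycles:
        for h in sorted(finals):
            # q_0 -> ... -> q_m takes m steps, one more step enters the cycle at
            # position 0, and h further steps reach position h.
            progressions.append((m + 1 + h, length))
    return progressions


def member(x, progressions):
    """Decide whether x belongs to the union of the progressions a + bN."""
    return any((x == a) if b == 0 else (x >= a and (x - a) % b == 0)
               for a, b in progressions)
```

One automaton shape consistent with the union listed above (the actual Figure 1 may differ) is `cnf_to_progressions(3, {1, 2}, [(3, {1}), (4, {1, 2}), (2, {0})])`, which yields exactly the six progressions in the text.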

Theorem 1 (Chrobak [4]) For every $n$-state unary NFA, there is an equivalent NFA in 
Chrobak normal form with $O(n^2)$ states, the periods of whose cycles cannot exceed $n$. 

We now briefly sketch Chrobak's proof in [4] and point out the subtle error in the proof. 
In the sequel, a strongly connected component (SCC) is said to be trivial if it contains only 
a single node with no self-loop. An SCC is nontrivial if it is not trivial, i.e., either it is a 
single node with self-loop or it contains at least two nodes. Notice that in a nontrivial SCC 
for every two nodes v and v' there is always a nonempty path from v to v'. 

Proof Suppose that $\mathcal{A} = (Q, q_0, \delta, F)$. If $L(\mathcal{A}) = \emptyset$, then the theorem is obvious and so 
we shall assume that $L(\mathcal{A}) \neq \emptyset$. Without loss of generality, we may then assume that $F = 
\{q_F\}$, there is no incoming transition to $q_0$, and that $q_F$ is the unique state with no outgoing 
transition. Furthermore, we can assume that all states in $Q$ can reach $q_F$ and can be reached 
from $q_0$. Standard results in graph theory (see [1]) state that we can decompose the directed 
graph structure of $\mathcal{A}$ into a directed acyclic graph (DAG) $G(\mathcal{A}) = (V, E)$ of SCCs in $\mathcal{A}$, 
where $V$ and $E$, respectively, denote the vertices and edges of $G(\mathcal{A})$. More precisely, $V$ is 
the set of all SCCs in $\mathcal{A}$, and $(D, D') \in E$ iff $\mathcal{A}$ has a transition $(q, q') \in \delta$ for some 
states $q$ and $q'$ in $D$ and $D'$, respectively. Note that $\{q_0\}$ is the unique node in $G(\mathcal{A})$ with no 
incoming arc (i.e. "root" of $G(\mathcal{A})$) and $\{q_F\}$ is the unique node in $G(\mathcal{A})$ with no outgoing 
arc (i.e. "leaf" of $G(\mathcal{A})$). A path $\alpha$ in $G(\mathcal{A})$ from $q_0$ to $q_F$ is called a superpath in $\mathcal{A}$. In 
the sequel, we will also think of a superpath as a subgraph of $\mathcal{A}$ induced by the SCCs in the 
superpath. 

For every superpath $\alpha$ in $\mathcal{A}$, let $L_\alpha$ be the set of all lengths of paths in $\mathcal{A}$ from $q_0$ to $q_F$ 
that are in $\alpha$. It follows that $L(\mathcal{A}) = \bigcup_\alpha L_\alpha$, where the union ranges over all superpaths in 
$\mathcal{A}$. Then, it suffices to show that each $L_\alpha$ is a union of some arithmetic progressions $a + b\mathbb{N}$ 
such that if $a \le n^2 + n$, then $b = 0$, and if $a > n^2 + n$, then $a \le n^2 + 2n$ and $0 < b \le n$. For 
then, the number of possible such arithmetic progressions is $O(n^2)$. 

We shall not worry about superpaths $\alpha$ consisting only of trivial SCCs as $x < n$ for every 
$x \in L_\alpha$. So, let us assume that $\alpha$ has at least one nontrivial SCC. Let $0 < a_1 < \ldots < a_p \le n$ 
be the lengths of all simple cycles in $\alpha$. Let $x \in L_\alpha$ and take a path $R$ in $\alpha$ of length $x$. 
Then, $x = x_0 + a_1 x_1 + \ldots + a_p x_p$, where $x_0$ is the length of the path obtained from $R$ by 
deleting cycles. Observe that $x_0 < n$. By Lemma 1, it follows that $L_\alpha = L^1_\alpha \cup L^2_\alpha$, where $L^1_\alpha$ 
contains no numbers bigger than $n^2 + n$, and $L^2_\alpha$ is an arithmetic progression with period 
$\gcd(a_1, \ldots, a_p)$ and offset $\le n^2 + 2n$. This completes the proof. □ 

Fig. 2 A simple example of a superpath with cycle dependencies 

Observe that the proof (last paragraph) tacitly assumes that the set $X$ of solutions to the 
equation $x = x_0 + a_1 x_1 + \ldots + a_p x_p$ corresponds to $L_\alpha$. Although it is the case that $X \supseteq L_\alpha$, 
it is not the case that $X \subseteq L_\alpha$ in general, which is illustrated in Example 1. This incorrect 
assumption is also made in the proof of Lemma 5 in [7]. 

Example 1 Consider the superpath $\alpha$ given in Figure 2. Here $\alpha$ has only one nontrivial 
SCC in the middle. Observe that one cannot traverse the cycle of length 5, 8, or 20 without 
traversing the cycle of length 7 at least once. Also, notice that one has to traverse at least 
once the cycle of length 8 or the cycle of length 20 before traversing a cycle of length 5. In 
effect, there are some solutions to the equation $x = 10 + 5x_1 + 7x_2 + 8x_3 + 20x_4$ that are not 
in $L_\alpha$; for instance, 15, 18, 20, and 30. 
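
To see that these four values really do solve the equation, one possible witness for each is written out below (a worked illustration added here; other witnesses may exist, for instance $30 = 10 + 5 + 7 + 8$, and whether any witness corresponds to an actual path depends on the structure shown in Figure 2).

```latex
\begin{align*}
15 &= 10 + 5\cdot 1 + 7\cdot 0 + 8\cdot 0 + 20\cdot 0,\\
18 &= 10 + 5\cdot 0 + 7\cdot 0 + 8\cdot 1 + 20\cdot 0,\\
20 &= 10 + 5\cdot 2 + 7\cdot 0 + 8\cdot 0 + 20\cdot 0,\\
30 &= 10 + 5\cdot 0 + 7\cdot 0 + 8\cdot 0 + 20\cdot 1.
\end{align*}
```

Each witness shown uses a cycle of length 5, 8, or 20 while setting $x_2 = 0$, which the first dependency described in the example rules out.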

3 How to fix the proof 

In this section, we show how to fill the gap explicated in Example 1. To do this, it suffices to 
show that each $L_\alpha$ is a union of some arithmetic progressions $a + b\mathbb{N}$ such that if $a \le 2n^2 + n$, 
then $b = 0$, and if $a > 2n^2 + n$, then $a \le 2n^2 + 2n$ and $0 < b \le n$. As before, the number 
of possible such arithmetic progressions is $O(n^2)$. In this section, we call such arithmetic 
progressions good. 

Let us take a superpath $\alpha$ with disjoint nontrivial SCCs $D_1, \ldots, D_r$ in some order with 
$r \ge 1$. Observe that there are only finitely (at most exponentially) many simple paths from 
$q_0$ to $q_F$ in $\alpha$, say, $\sigma_1, \ldots, \sigma_t$. Define $L_{\sigma_i}$ to be the lengths of all (not necessarily simple) 
paths from $q_0$ to $q_F$ in $\alpha$ that can be shortened to $\sigma_i$ by deleting cycles. It is clear that 
$L_\alpha = \bigcup_{i=1}^{t} L_{\sigma_i}$. So, it suffices to show that $L_{\sigma_i}$ is a union of good arithmetic progressions. 

Consider one of the simple paths $\sigma_i$ and suppose that $v_j$ is the point where $\sigma_i$ enters the 
SCC $D_j$. We shall take a shortest (not necessarily simple) cycle $C_j$ at point $v_j$ that visits each 
node in $D_j$ at least once. The length of $C_j$ cannot exceed $|D_j|(|D_j| - 1)$, where $|D_j|$ is the 
number of nodes in $D_j$. This is because the length of a shortest path from a node $u$ to another 
node $u'$ in $D_j$ is at most $|D_j| - 1$. So, we may modify the path $\sigma_i$ as follows: at each $v_j$, before 
proceeding to the next node in $\sigma_i$, we shall first traverse the loop $C_j$, which will take us back 
to $v_j$. This is illustrated in Figure 3. Clearly, as $\sum_{j=1}^{r} |D_j| \le n$, the length of the resulting path 
$\sigma_i'$ is $|\sigma_i| + \sum_{j=1}^{r} |C_j| \le n + \sum_{j=1}^{r} |D_j|^2 \le n + \bigl(\sum_{j=1}^{r} |D_j|\bigr)^2 \le n + n^2$. As $\sigma_i'$ visits each node 
in $\alpha$ at least once, the set $X$ of all $x$'s for which $x = |\sigma_i'| + a_1 x_1 + \ldots + a_p x_p$ is solvable 
in natural numbers is contained in $L_{\sigma_i}$. By Lemma 1, $X = X_1 \cup (g + b\mathbb{N})$, where $X_1$ contains 
numbers no bigger than $2n^2 + n$, and $g$ is the least integer bigger than $2n^2 + n$ such that 
$g \equiv |\sigma_i'| \pmod{b}$ where $b = \gcd(a_1, \ldots, a_p)$. Notice that $g \le 2n^2 + 2n$. Furthermore, every 
compound cycle is composed of simple cycles and so we see that $|C_j|$ is a linear combination 
of $a_1, \ldots, a_p$. This implies that $b$ divides $\sum_{j=1}^{r} |C_j|$ and so $g \equiv |\sigma_i| \pmod{b}$. 

Fig. 3 The modified path $\sigma_i'$ inside the superpath $\alpha$, zooming in on a particular SCC $D_j$. The dotted line is 
the original path $\sigma_i$. The dashed line is the additional loop $C_j$. 

Conversely, the set $X'$ of $x$'s for which the Diophantine equation $x = |\sigma_i| + a_1 x_1 + \ldots + a_p x_p$ 
is solvable in natural numbers contains $L_{\sigma_i}$ as we already saw. Therefore, if $m \in L_{\sigma_i}$ 
and $m > 2n^2 + n$, then Lemma 1 implies that $m \equiv |\sigma_i| \pmod{b}$, where $b = \gcd(a_1, \ldots, a_p)$. 
We already saw that $g + b\mathbb{N} \subseteq X$ and thus $m \in X$. In summary, we have $L_{\sigma_i} = K \cup X$, where 
$K$ contains numbers no bigger than $2n^2 + n$. This completes the proof of Theorem 1. 
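
The bound $|D_j|(|D_j| - 1)$ on the loop $C_j$ can be illustrated with a small sketch. This is an assumption-laden illustration rather than the construction in the proof: it does not compute a shortest covering loop, but concatenates BFS shortest paths through the nodes in a fixed order, which consists of $|D_j|$ legs of at most $|D_j| - 1$ edges each and therefore meets the same bound. The names `shortest_path` and `covering_loop` and the dictionary encoding of the SCC are hypothetical.

```python
from collections import deque


def shortest_path(succ, src, dst):
    """Shortest path (as a list of nodes) from src to dst; assumes dst is reachable."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for w in succ[u]:
            if w not in parent:
                parent[w] = u
                queue.append(w)
    path = [dst]
    while path[-1] != src:
        path.append(parent[path[-1]])
    return path[::-1]


def covering_loop(succ, v):
    """Closed walk at v visiting every node of the SCC whose successor map is `succ`."""
    order = [v] + [u for u in succ if u != v] + [v]
    walk = [v]
    for a, b in zip(order, order[1:]):
        walk.extend(shortest_path(succ, a, b)[1:])  # drop the repeated start node
    return walk
```

For an SCC with $k$ nodes, `len(covering_loop(succ, v)) - 1` is the number of edges of the loop and never exceeds $k(k-1)$.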



4 Martinez's algorithm 

Theorem 2 ([9,10]) There is a poly-time algorithm converting a given unary NFA to an 
equivalent one in Chrobak normal form with at most quadratic blow-up. 

Martinez's poly-time algorithm [9,10] works by making sure that each step in Chrobak's 
proof in [4] can be performed in poly-time. The correctness of Martinez's algorithm relies 
on the correctness of Chrobak's construction. Due to the subtle error, one has to adjust 
Martinez's algorithm slightly, which fortunately is not hard to do. Nevertheless, for the sake of 
completeness, we describe the resulting algorithm. Furthermore, we later show that 
Martinez's algorithm cannot be improved to have logarithmic working space unless L = NL. 

Martinez's algorithm consists of four basic steps. The first step is to ensure that the input 
NFA $\mathcal{A}$ satisfies $L(\mathcal{A}) \neq \emptyset$ and is in the form required in Chrobak's proof (see above). [If 
$L(\mathcal{A}) = \emptyset$, just output $\emptyset$.] This can easily be achieved in poly-time by invoking a graph 
reachability algorithm appropriately several times. So we can write $\mathcal{A} = (Q, q_0, \delta, \{q_F\})$. 
Second, we apply Kosaraju's poly-time algorithm [1] to decompose $\mathcal{A}$ into the DAG $G(\mathcal{A})$ 
of SCCs. Third, we compute $\gcd(D)$ for each nontrivial SCC $D$ in $\mathcal{A}$, where $\gcd(D)$ denotes 
the greatest common divisor of the lengths of all simple cycles in the SCC $D$. Fourth, using 
this information, we compute the desired union of arithmetic progressions. The only part 
requiring modification is the fourth step. 

Assuming that $G(\mathcal{A})$ has been computed, we shall describe the third step of Martinez's 
algorithm. Let us view an SCC $D$ as an adjacency matrix $M_D$. By performing simple matrix 
multiplications, we compute $M_D^1, M_D^2, \ldots, M_D^{|D|}$. We can easily obtain all $j$'s such that the 
diagonal of $M_D^j$ contains some nonzero entry. Call them $j_1, \ldots, j_r$. These numbers correspond 
to the lengths of (not necessarily simple) cycles in $D$ of length at most $|D|$. Then, it is the case 
that $\gcd(j_1, \ldots, j_r) = \gcd(D)$. To see this, first observe that $\gcd(j_1, \ldots, j_r)$ divides $\gcd(D)$ 
since if $l$ is the length of a simple cycle of $D$, then it has to be one of $j_1, \ldots, j_r$. Conversely, 
if $j_i$ is the length of a compound cycle in $D$, then $j_i$ can be written as a linear combination of 
the lengths of simple cycles in $D$. Therefore, $\gcd(D)$ also divides $\gcd(j_1, \ldots, j_r)$. As matrix 
multiplications and $\gcd(j_1, \ldots, j_r)$ are poly-time computable, we can compute $\gcd(D)$ in 
poly-time. 
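
The third step can be sketched in a few lines. This is a minimal illustration, assuming the SCC is given as a square 0/1 adjacency matrix (a list of lists); the function name `scc_gcd` is hypothetical, and Boolean matrix products are used instead of integer ones since only the zero/nonzero pattern of the diagonal matters.

```python
from math import gcd


def scc_gcd(adj):
    """gcd of all j in 1..|D| such that the diagonal of M_D^j has a nonzero entry."""
    n = len(adj)
    power = [row[:] for row in adj]  # Boolean M_D^1
    g = 0
    for j in range(1, n + 1):
        if any(power[i][i] for i in range(n)):
            g = gcd(g, j)  # there is a closed walk of length j in D
        # Boolean product: power becomes M_D^(j+1)
        power = [
            [int(any(power[i][k] and adj[k][c] for k in range(n))) for c in range(n)]
            for i in range(n)
        ]
    return g
```

For a nontrivial SCC the result is positive, because some simple cycle has length at most $|D|$.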

Before describing the fourth and final step of Martinez's algorithm, we shall recall an 
important idea from Chrobak's original proof in [4]. For every nontrivial SCC $D$ in the 
given NFA $\mathcal{A} = (Q, q_0, \delta, \{q_F\})$, we let $\Pi(D)$ be the set of all superpaths from $q_0$ to $q_F$ 
whose last nontrivial SCC (i.e. closest to $q_F$) is $D$. Also, let $\Pi_0$ be the set of all superpaths 
with no nontrivial SCCs. Using the notations in the proof above, we clearly have $L(\mathcal{A}) = 
\bigcup_{\alpha \in \Pi_0} L_\alpha \cup \bigcup_D \bigcup_{\alpha \in \Pi(D)} L_\alpha$, where $D$ ranges over all nontrivial SCCs in $\mathcal{A}$. As we saw 
before, $\bigcup_{\alpha \in \Pi_0} L_\alpha$ contains no numbers bigger than $n$. In the following, we use $\gcd(\alpha)$ to 
denote the greatest common divisor of the lengths of all simple cycles in $\alpha$. 

Lemma 2 For every nontrivial SCC $D$ in $\mathcal{A}$ with $d := \gcd(D)$, the set $\{x \in \bigcup_{\alpha \in \Pi(D)} L_\alpha : x > 
2n^2 + n\}$ is a union of some arithmetic progressions $a + d\mathbb{N}$ such that $2n^2 + n < a \le 2n^2 + 3n$. 

Proof It suffices to show that, for each $\alpha \in \Pi(D)$ with $g := \gcd(\alpha)$, the set $\{x \in L_\alpha : x > 
2n^2 + n\}$ is a union of some arithmetic progressions $a + d\mathbb{N}$ such that $2n^2 + n < a \le 2n^2 + 3n$. 
From the proof of Theorem 1 above, it follows that the set $\{x \in L_\alpha : x > 2n^2 + n\}$ is a union 
of some arithmetic progressions $a + g\mathbb{N}$ such that $2n^2 + n < a \le 2n^2 + 2n$. All simple cycles 
in $D$ are also simple cycles of $\alpha$ (as $D$ is a subgraph of $\alpha$) and so $g$ divides $d$. Thus we may 
simply replace each of these $a + g\mathbb{N}$ with a union of the following arithmetic progressions: 
$a + d\mathbb{N}$, $(a + g) + d\mathbb{N}$, $(a + 2g) + d\mathbb{N}$, $\ldots$, $(a + d - g) + d\mathbb{N}$. Since $d \le n$, each offset $a'$ in any 
of these arithmetic progressions satisfies $2n^2 + n < a' \le 2n^2 + 3n$. □ 
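
The replacement of $a + g\mathbb{N}$ by progressions of period $d$ used in the proof can be sketched directly. The function names `refine` and `members` are illustrative assumptions; `members` enumerates a progression only up to a finite limit so that the two sides can be compared.

```python
def refine(a, g, d):
    """Split a + gN into the progressions a + dN, (a+g) + dN, ..., (a+d-g) + dN."""
    if g <= 0 or d % g != 0:
        raise ValueError("g must be positive and divide d")
    return [(a + k * g, d) for k in range(d // g)]


def members(a, b, limit):
    """Elements of a + bN (with b > 0) that are at most `limit`."""
    return set(range(a, limit + 1, b))
```

For instance, `refine(100, 2, 6)` gives `[(100, 6), (102, 6), (104, 6)]`, and the union of these three progressions agrees with $100 + 2\mathbb{N}$ on any finite range.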

We now describe the final step of Martinez's algorithm. For the sake of simplicity, 
we give a simplified version of Martinez's algorithm that is slightly less efficient, albeit 
still poly-time (see Martinez's papers [9,10] for how to avoid recomputations). As 
$L(\mathcal{A}) = \bigcup_{\alpha \in \Pi_0} L_\alpha \cup \bigcup_D \bigcup_{\alpha \in \Pi(D)} L_\alpha$ with $D$ ranging over all nontrivial SCCs in $\mathcal{A}$, our 
strategy is as follows: (i) compute all numbers $x \in L(\mathcal{A})$ that are at most $2n^2 + n$, and (ii) 
compute a union of arithmetic progressions for each $\bigcup_{\alpha \in \Pi(D)} L_\alpha$ according to Lemma 2. 
Step (i) is quite easily achieved as we only need to use the standard linear-time algorithm 
for the membership problem for NFAs. The following lemma will be used in step (ii). 
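
Step (i) can be sketched as a forward simulation of the unary NFA. This is a minimal illustration under assumed encodings: `succ` maps each state to a list of successors, `finals` is a set of states, and `bound` would be $2n^2 + n$ in the algorithm; the function name `accepted_lengths` is hypothetical.

```python
def accepted_lengths(succ, q0, finals, bound):
    """All lengths x <= bound such that some path of length x leads from q0 to a final state."""
    current = {q0}  # states reachable from q0 by exactly `length` transitions
    lengths = set()
    for length in range(bound + 1):
        if current & finals:
            lengths.add(length)
        current = {w for u in current for w in succ.get(u, ())}
    return lengths
```

Each iteration touches every transition at most once, so the whole computation is polynomial in $n$ and the bound.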

Lemma 3 Given a nontrivial SCC $D$ and an arithmetic progression $a + b\mathbb{N}$ with $2n^2 + n < 
a \le 2n^2 + 3n$ and $b = \gcd(D)$, the following statements are equivalent: 

1. $a + b\mathbb{N} \subseteq \{x \in \bigcup_{\alpha \in \Pi(D)} L_\alpha : x > 2n^2 + n\}$. 

2. There exists a path from $q_0$ to $q_F$ of length $a$ that is contained in some superpath 
$\alpha \in \Pi(D)$. 

3. There exists a path from $q_0$ to $q_F$ of length $a$ whose last vertex belonging to a nontrivial 
SCC is in $D$. 

Proof Statements (2) and (3) are equivalent by definition. It is obvious that (1) implies (2). 
Conversely, (2) implies that $a \in S_D$, where $S_D := \{x \in \bigcup_{\alpha \in \Pi(D)} L_\alpha : x > 2n^2 + n\}$. By Lemma 2, 
there exists an arithmetic progression $a' + b\mathbb{N} \subseteq S_D$ with $2n^2 + n < a' \le 2n^2 + 3n$ such that 
$a \in a' + b\mathbb{N}$. It follows that $a = a' + ib$ for some $i \in \mathbb{N}$ and so $a + b\mathbb{N} \subseteq a' + b\mathbb{N} \subseteq S_D$. □ 

Fig. 4 The left graph is $\mathcal{A}$ and the right graph is $\mathcal{A}_D$. 

By Lemma 3, we shall run through all nontrivial SCCs $D$ in $\mathcal{A}$ and collect all arithmetic 
progressions $a + b\mathbb{N}$ with $2n^2 + n < a \le 2n^2 + 3n$ and $b = \gcd(D)$ satisfying statement 3 of 
Lemma 3. Checking statement 3 of Lemma 3 can be done as follows. Let $\mathrm{Reach}(D)$ be the 
set of all states reachable from $D$ in $\mathcal{A}$. For each nontrivial SCC $D$, we create a 
new copy $\mathcal{A}_D$ of $\mathcal{A}$ where we remove all nontrivial SCCs $D'$ with $D' \neq D$ that are reachable 
from $D$ and all transitions from some state $v \notin \mathrm{Reach}(D)$ to a state $v' \in \mathrm{Reach}(D) - D$. This 
is illustrated in Figure 4. The new directed graph $\mathcal{A}_D$ is computable in poly-time by easy 
applications of graph reachability algorithms. Observe that all paths from $q_0$ to $q_F$ in $\mathcal{A}$ 
whose last vertex belonging to a nontrivial SCC is in $D$ coincide with all paths from $q_0$ to $q_F$ 
in $\mathcal{A}_D$. We thus need to check whether there is a path of length $a$ from $q_0$ to $q_F$ in $\mathcal{A}_D$, which 
is doable in poly-time. Finally, it is easy to check that the entire algorithm runs in poly-time. 
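
The construction of $\mathcal{A}_D$ and the path-length check can be sketched as follows. This is an illustrative sketch under stated assumptions, not Martinez's implementation: the automaton is a dictionary `succ` from states to successor lists, `D` is the set of states of a nontrivial SCC, and a state of $\mathrm{Reach}(D) - D$ is taken to lie in some nontrivial SCC exactly when it can reach itself. All function names are hypothetical.

```python
def reachable_from(succ, sources):
    """States reachable from `sources` by one or more transitions."""
    seen, stack = set(), list(sources)
    while stack:
        u = stack.pop()
        for w in succ.get(u, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def restrict_to_last_scc(succ, D):
    """Sketch of A_D: drop other nontrivial SCCs after D and transitions bypassing D."""
    reach = set(D) | reachable_from(succ, D)
    after = reach - set(D)
    # A state in Reach(D) - D that can reach itself lies in a nontrivial SCC D' != D.
    removed = {u for u in after if u in reachable_from(succ, [u])}
    new_succ = {}
    for u, targets in succ.items():
        if u in removed:
            continue
        new_succ[u] = [w for w in targets
                       if w not in removed and not (u not in reach and w in after)]
    return new_succ


def has_path_of_length(succ, q0, qF, a):
    """Is there a path with exactly `a` transitions from q0 to qF?"""
    current = {q0}
    for _ in range(a):
        current = {w for u in current for w in succ.get(u, ())}
    return qF in current
```

Calling `has_path_of_length(restrict_to_last_scc(succ, D), q0, qF, a)` then corresponds to checking statement 3 of Lemma 3 for the offset `a`.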

We finally remark that a log-space transducer converting a unary NFA to an equivalent 
one in Chrobak normal form is unlikely to exist. To see this, suppose that such a transducer 
exists. Observe that there is a simple log-space transducer converting an NFA in Chrobak 
normal form to an equivalent union of arithmetic progressions. Therefore, combining this 
with our hypothetical log-space transducer converting a unary NFA to an equivalent one in 
Chrobak normal form, we have a log-space reduction from the nonemptiness problem for 
unary NFAs to the nonemptiness problem for unions of arithmetic progressions. The first 
problem is equivalent under log-space reductions to graph reachability, which is complete 
for NL, while the second is obviously in L. Hence, we obtain that L = NL. This gives us the 
following theorem. 

Theorem 3 There is no log-space transducer converting a unary NFA to an equivalent one 
in Chrobak normal form, unless L = NL. 